runZinb <- T
runClus <- T
NCORES <- 7
EAP: small request. Can everyone put a line between the beginning of a r chunk and text? It makes it nicely formated for my text editor.
EAP: if we are going to ultimately use these functions, it would be good to add them to the package for zinbwave.
FP: yes, I agree. Davide, we wait for zinbwave to be on bioconductor before to push changes, right?
compute.zinb.loglik <- function(Y, zinb){
mu = t(getMu(zinb))
theta = getTheta(zinb)
theta_mat = matrix(rep(theta, ncol(Y), ncol = ncol(Y)))
pi = t(getPi(zinb))
log( pi * (Y == 0) + (1 - pi) * dnbinom(Y, size = theta, mu = mu) )
}
compute.deviance.residuals <- function(Y, zinb){
mu_hat = t(getMu(zinb))
pi_hat = t(getPi(zinb))
Y_hat = (1 - pi_hat) * mu_hat
ll = compute.zinb.loglik(Y, zinb)
sign = 1*(Y - Y_hat > 0)
sign[sign == 0] = -1
sign * sqrt(-2 * ll)
}
We propose a worklow to analyze single cell RNA-Seq with the following steps
Along the worflow, use deviance residuals as adjusted values.
knitr::include_graphics('../vignettes/workflow.png')
Worflow to analyze single cell RNASeq data
Along the workflow, we want to use a unique SummarizedExperiement object carrying all the data we need.
EAP: I have updated the code to pull from a dataset on the repos that is created with the createData.R file. For now, I am filtering to the top 1000 most variable genes there, though we might want to add that to the code for the article. This will be slightly different data from Russell’s, which didn’t use all of the samples. We can adjust that decision later, or just compare the samples that are the same. (Russell’s clusterLabels are in the meta data)
EAP: zinbFit doesn’t accept data.frame objects, so currently have to have a data.matrix command. Should it be changed so that it does?
#counts<-read.table("../data/oeCufflinkCountData.txt",sep="\t",header=TRUE)
core <- read.table("../data/oeCufflinkCountData_1000Var.txt",
sep = "\t", header = TRUE)
core <- data.matrix(core)
metadata <- read.table("../data/oeMetadata.txt", sep = "\t", header = TRUE)
# symbol for samples missing from original clustering
metadata$clusterLabels[is.na(metadata$clusterLabels)] <- -2
Here we only look at the 1000 most variable genes. EAP: see note above, I’ve commented out the filtering and added it to the createData.R for now.
batch <- metadata$Batch
Cells have been processed in 18 different batches
col_batch = rep(brewer.pal(9, "Set1"), 2)
names(col_batch) = unique(batch)
table(batch)
## batch
## GBC08A GBC08B GBC09A GBC09B P01 P02 P03A P03B P04 P05
## 41 47 43 38 34 49 73 52 24 23
## P06 P10 P11 P12 P13 P14 Y01 Y04
## 53 51 54 51 60 49 65 42
We have qc measures from the data
qc <- metadata[, !names(metadata) %in% c("Batch", "Experiment", "clusterLabels")]
head(qc, 2)
## NREADS NALIGNED RALIGN TOTAL_DUP PRIMER
## OEP01_N706_S501 3313260 3167600 95.6035 47.9943 0.0154566
## OEP01_N701_S501 2902430 2757790 95.0167 45.0150 0.0182066
## PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES
## OEP01_N706_S501 2e-06 0.200130 0.230654
## OEP01_N701_S501 0e+00 0.182461 0.201810
## PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES
## OEP01_N706_S501 0.404205 0.165009 0.430784
## OEP01_N701_S501 0.465702 0.150027 0.384271
## MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS
## OEP01_N706_S501 0.843857 0.061028 0.521079
## OEP01_N701_S501 0.914370 0.033350 0.373993
## CreER ERCC_reads
## OEP01_N706_S501 1 10516
## OEP01_N701_S501 3022 9331
clus.labels <- metadata[, "clusterLabels"]
In original work (FP: add ref), cells have been clustered into 14 different clusters
col_clus <- c("transparent", brewer.pal(12, "Set3"), brewer.pal(8, "Set2"))
col_clus <- col_clus[1:length(unique(clus.labels))]
names(col_clus) <- sort(unique(clus.labels))
table(clus.labels)
## clus.labels
## -2 1 2 3 4 5 7 8 9 10 11 12 14 15
## 233 91 25 56 40 96 60 28 79 26 22 35 26 32
Batches are kind of confounded with the biology
table(data.frame(batch = as.vector(batch),
cluster = clus.labels))
## cluster
## batch -2 1 2 3 4 5 7 8 9 10 11 12 14 15
## GBC08A 5 0 2 12 9 0 0 0 0 0 2 0 2 9
## GBC08B 14 0 7 5 3 0 0 0 1 2 4 0 5 6
## GBC09A 13 0 1 5 9 0 0 0 1 1 0 0 6 7
## GBC09B 21 0 2 2 7 0 0 0 3 0 0 0 3 0
## P01 9 0 2 4 3 15 1 0 0 0 0 0 0 0
## P02 6 2 0 9 3 15 3 3 2 3 0 2 1 0
## P03A 36 3 0 3 0 12 2 9 4 2 0 2 0 0
## P03B 19 1 2 1 1 11 1 2 10 1 1 2 0 0
## P04 10 0 0 0 0 11 1 0 1 1 0 0 0 0
## P05 3 0 0 0 1 11 3 0 1 0 2 2 0 0
## P06 14 1 2 3 0 8 2 4 8 4 1 2 2 2
## P10 15 3 1 4 0 4 5 9 2 0 2 5 0 1
## P11 10 2 1 1 0 1 5 1 22 3 1 6 0 1
## P12 11 0 2 0 0 4 10 0 8 2 3 6 4 1
## P13 13 1 2 4 0 4 15 0 4 5 6 1 3 2
## P14 10 0 0 1 2 0 12 0 12 2 0 7 0 3
## Y01 14 47 1 1 2 0 0 0 0 0 0 0 0 0
## Y04 10 31 0 1 0 0 0 0 0 0 0 0 0 0
We have 849 cells.
dim(core)
## [1] 1000 849
core[1:3, 1:3]
## OEP01_N706_S501 OEP01_N701_S501 OEP01_N707_S507
## Cbr2 5799 3638 1448
## Cyp2f2 2158 2027 1078
## Gstm1 8763 7221 3581
Let’s create a SummarizedExperiment object to store the raw counts and information about the data, that is batches, original labels, and quality control measures.
se <- SummarizedExperiment(assays = list(rawCounts = core),
colData = metadata)
To cluster and get lineages we want to reduce the dimension of the data. We are going to use zinbwave to do so. First, let’s fit zinbwave with first K = 0 to compute normalized values (i.e. deviance residuals) adjusted for batches. We could also adjust for gene length or GC content here. We then fit zinbwave to get the dimensionality reduced matrix W specifying the number of dimension K = 50. Eventually, we will call zinbwave just once where we would have an argument in zinbFit like “compute_normalized_values” in c(TRUE, FALSE). For K = 0 and K = 50, we correct for batch effect including batches in X.
fn0 <- '../data/zinb_k0_batch.rda'
if (runZinb & !file.exists(fn0)){
mod <- model.matrix( ~ batch)
print(system.time(zinb0 <- zinbFit(core, ncores = NCORES,
K = 0, X = mod)))
save(zinb0, file = fn0)
}else{
load(fn0)
}
fn50 <- '../data/zinb_k50_batch.rda'
if (runZinb & !file.exists(fn50)){
mod <- model.matrix( ~ batch)
print(system.time(zinb50 <- zinbFit(core, ncores = NCORES,
K = 50, X = mod)))
save(zinb50, file = fn50)
}else{
load(fn50)
}
We use deviance residuals as normalized values for visualization. FP: explain rational: K=0 so residuals capture the bio adjusting for batch. Let’s check that deviance residuals look ok.
res <- compute.deviance.residuals(core, zinb0)
res[1:3,1:3]
## OEP01_N706_S501 OEP01_N701_S501 OEP01_N707_S507
## Cbr2 4.541942 -4.431585 -4.239279
## Cyp2f2 4.316532 4.301158 -4.146744
## Gstm1 4.629884 4.583829 -4.429794
Boxplot of the normalized values for each cell. It seems that correction for batches is ok.
res_order <- res[, order(as.numeric(batch))]
col_order <- as.numeric(batch)[order(as.numeric(batch))]
boxplot(res_order, main='Boxplot of normalized values\ncolor=batch',
col = col_order, staplewex = 0, outline = 0, border = col_order, xaxt = 'n')
PCA on the normalized values where color are for batches on the left and previously found clusters on the right. We want no clustering on the left side and clustering on the right side.
pca <- prcomp(t(res))
par(mfrow = c(1,2))
plot(pca$x, col = col_batch[batch], pch = 20,
main="PCA of normalized values\ncolor=batch")
plot(pca$x, col = col_clus[as.character(clus.labels)], pch = 20,
main = "PCA of normalized values\ncolor=cluster")
par(mfrow = c(1,1))
Let’s add the normalized values as an assay dataset in our SummarizedExperiment object.
assays(se)[['normalizedValues']] <- res
Let’s check that performing MDS on W we have something coherent with original clusters.
W <- getW(zinb50)
d <- dist(W)
fit <- cmdscale(d, eig = TRUE, k = 2)
plot(fit$points, col = col_clus[as.character(clus.labels)], main = 'MDS', pch = 20,
xlab = 'Component 1', ylab = 'Component 2')
legend(x = 'bottomright', legend = unique(names(col_clus)), cex = .5,
fill = unique(col_clus), title = 'Sample')
Let’s add W to the colData of our SummarizedExperiment object.
W <- data.frame(W)
colnames(W) <- paste0('W', 1:ncol(W))
colData(se) <- cbind(colData(se), W)
FP: what do you think of implemented a “compute_normalized_values” argument to zinbFit for summarizedExperiment objects where zinbFit would take as input a SummarizedExperiment object and return a summarizedExperiment object with slots added for normalized values and W?
We use clusterExperiment with W.
FP: What about the following call to RSEC?
EP: I updated it to work on a SE object so that it has the meta data. If you have a summarized experiment object with W already, you could use that as long as assay(seObj) gives W.
W <- colData(se)[, grepl('^W', colnames(colData(se)))]
W <- as.matrix(W)
fn <- '../data/RSEC_W.rda'
if (runClus & !file.exists(fn)){
#symbol for samples missing from original clustering
metadata$clusterLabels[is.na(metadata$clusterLabels)]<- -2
seObj<-SummarizedExperiment(t(W),colData=metadata)
print(system.time(ceObj <- RSEC(seObj, k0s = 4:15, alphas = c(0.1),
betas = 0.8,
clusterFunction = "hierarchical01", minSizes=1,
ncores = NCORES, isCount=FALSE,
subsampleArgs = list(resamp.num=100,
clusterFunction="kmeans",
clusterArgs=list(nstart=10)),
seqArgs = list(k.min=3, top.can=5), verbose=TRUE,
combineProportion = 0.7,
mergeMethod = "none")))
save(ceObj, file = fn)
}else{
load(fn)
}
plotClusters(ceObj, colPalette = c(bigPalette, rainbow(199)))
plotCoClustering(ceObj)
## Warning in .makeColors(clusters, colors = bigPalette): too many clusters to
## have unique color assignments
table(primaryClusterNamed(ceObj))
##
## -1 c1 c2 c3 c4 c5 c6 c7 c8 c9
## 316 145 5 8 94 90 105 44 7 35
sum(primaryCluster(ceObj) == -1)
## [1] 316
FP: Elizabeth, we are working with the W here, does the locfdr make sense in this context? I set eval=FALSE in the next chunk to skip the merging step, let me know if you would rather keep using it. And if we want to still use the merging step, would we want to include it in RSEC function arguments instead of separately?
EP: I don’t think the merging step on the W makes a whole lot of sense – the method is irrelevant. The merging is based on calculating the % of genes found significant (the specific method is arbitrary). The best thing would be to replace the W with residuals in the assay of ceObj (or whatever data that you will do the DE on for the time stuff below), and then run the merging step on that data. I’m not particularly fond of locfdr. It was probably the method that gave the best merging to Russell and Diya. You’d really have to run mergeClusters setting plotInfo="all" and look at the results and decide both the cutoff level and the method.
EP: Also, if you don’t save the output of mergeClusters it doesn’t update ceObj. I was calling it for just the resulting plots, since it was already merged in RSEC above. I’ve changed to code to update ceObj below.
FP: Ha ok, good to know. I’ll keep the eval=FALSE for the moment.
#re-does merging simpling to make plot
#something like:
#assay(ceObj)
# if that replacement data should be considered on the transformed scale in plots, etc, the transformation function should be fixed as well:
#transformation(ceObj)
ceObj<-mergeClusters(ceObj, mergeMethod = "locfdr",
plotInfo = "mergeMethod", cutoff = 0.01)
FP: Not sure we should show the following heatmap using the W, we should rather use the normalized values (following heatmap).
plotHeatmap(ceObj, clusterSamplesData = "dendrogramValue",
breaks = .99)
So, let’s look at a heatmap on normalized values.
FP: Elizabeth, I did not find how to define the column annotation track in the plot below to have the same colors as in ceObj@clusterLegend[[1]]. I tried to use arguments annColors and annCol from aheatmap as it is said in plotHeatmap documentation that for signature matrix arguments can be passed to aheatmap. But I got the error “The following arguments to aheatmap cannot be set by the user in plotHeatmap:Rowv,Colv,color,annCol,annColors”.
EP: Fanny, you would need to use the argument ‘clusterLegend’. That argument takes either the format of aheatmap (list with each element a named vector of colors) or the format of the clusterExperiment object (i.e. list with each element a matrix with columns for name and color). So I think the following code will run, though it might need the list to have names…
But an easier fix to the code would be to set visualizeData option. I haven’t tested this because I don’t have the objects need run, so let me know if there is error.
FP: it seems great to me, what do you think?
# sampleData <- data.frame(ours = primaryCluster(ceObj))
# plotHeatmap(assays(se)$normalizedValues,
# main = 'Normalized values, 1000 most variable genes',
# clusterSamplesData = ceObj@dendro_samples,
# sampleData = as.matrix(sampleData),clusterLegend=ceObj@clusterLegend[1])
# easier fix:
plotHeatmap(ceObj, visualizeData = assays(se)$normalizedValues,
whichClusters = "primary",
clusterSamplesData = "dendrogramValue",
main = 'Normalized values, 1000 most variable genes',
breaks = 0.99)
plot(fit$points, col = col_clus[as.character(clus.labels)],
main = 'MDS W, color = original clusters', pch = 20,
xlab = 'Component1', ylab = 'Component2')
legend(x = 'bottomright', legend = unique(names(col_clus)), cex = .5,
fill = unique(col_clus), title = 'Sample')
palDF <- ceObj@clusterLegend[[1]]
pal <- palDF[, 'color']
names(pal) <- palDF[, 'name']
pal["-1"] = "transparent"
plot(fit$points, col = pal[primaryClusterNamed(ceObj)],
main = 'MDS W, color = our new clusters', pch = 20,
xlab = 'Component1', ylab = 'Component2')
legend(x = 'bottomright', legend = names(pal), cex = .5,
fill = pal, title = 'Sample')
The goal of this section is to see if we need to refit zinbwave when we want to run slingshot. We first run slingshot on the W used by clusterExperiment. In the second part of this section, we fit zinbwave on the matrix of counts where the unassigned cells have been removed. For each part (without or with refitting zinbwave), we run slingshot in the supervised and unsupervised mode and try k=3, k=4, k=5 dimensions in W.
From what I understand, start original clusters are 1 and 5 (HBC) and end original clusters are 15 (Microvillus), 9 and 12 (neuron), and 4, 7 (Sus). Additionally, we want the GBC cluster to be a junction before the differentiation between Microvillus and Neuron. The correspondance with the original clusters is as follow
table(data.frame(original = clus.labels, ours = primaryClusterNamed(ceObj)))
## ours
## original -1 c1 c2 c3 c4 c5 c6 c7 c8 c9
## -2 119 56 2 4 7 26 8 1 5 5
## 1 48 43 0 0 0 0 0 0 0 0
## 2 3 0 0 0 22 0 0 0 0 0
## 3 3 4 0 0 49 0 0 0 0 0
## 4 12 2 0 3 0 23 0 0 0 0
## 5 55 38 3 0 0 0 0 0 0 0
## 7 18 0 0 1 0 41 0 0 0 0
## 8 26 2 0 0 0 0 0 0 0 0
## 9 16 0 0 0 0 0 63 0 0 0
## 10 1 0 0 0 0 0 0 25 0 0
## 11 7 0 0 0 15 0 0 0 0 0
## 12 1 0 0 0 0 0 34 0 0 0
## 14 5 0 0 0 1 0 0 18 2 0
## 15 2 0 0 0 0 0 0 0 0 30
| Cluster name | Description | Correspondence | Color |
|---|---|---|---|
| c1 | HBC | original 1, 5 | darkblue |
| c2 | new and small | new and small | green |
| c3 | new and small | new and small | red |
| c4 | GBC / immature neurons / MV 1 | original 2, 3, 11, 14 | orange |
| c5 | Sus | original 4, 7 | darkpurple |
| c6 | Neuron | original 9, 12 | brown |
| c7 | Immature Neuron | original 10, 14 | cian |
| c8 | Immature Neuron | original 14 | lightpurple |
| c9 | Microvillus | original 15 | grayblue |
FP: We need to work on the cluster coloring (!) and the matching with the clusters you found for the paper with Russell, maybe by changing the parameters in clusterMany?
EP: The function plotClusters aligns the clusters and can return the aligned coloring between clusterings. There is an option to return an updated CE object that changes the colors in clusterLegend to match the new colors found by aligning the clusters (resetNames and resetColors arguments). These haven’t been extensively tested in real life because I don’t think we’ve used them much. But it would be good to demonstrate!
FP: With all the improvements made, I think we simply have the correct colors now.
Kvec <- c(3, 4, 5)
The input of slingshot is the W used for clusterExperiment where the number of dimensions is reduced to k where k in (3, 4, 5) here.
K = 3 does not seem very good to me: Sus is not an end cluster, GBC is an end cluster.
K = 4 is better, slingshot finds the end clusters but there is a spurious end cluster.
K = 5 does not seem great to me: GBC is an end cluster and Sus and Microvillus are in the same lineage.
our_cl <- primaryClusterNamed(ceObj)
cl = our_cl[our_cl != "-1"]
pal = pal[names(pal) != '-1']
for (k in Kvec){
X <- W[our_cl != "-1", 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1")
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c2" "c3" "c5" "c9"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4"
## NULL
## NULL
## [1] "K=4"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4" "c9"
## [1] "c1" "c3" "c5"
## [1] "c1" "c2"
## NULL
## [1] "K=5"
## [1] "c1" "c2" "c3" "c5" "c9"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4"
## NULL
## NULL
K = 3 finds GBC as an end cluster (that I did not specify in the end.clus!).
K = 4 Yeah! it seems that it is what we want even if we still have a spurius end cluster and GBC not really at the junction.
K = 5 Yeah! even if GBC not really at the junction.
for (k in Kvec){
X <- W[our_cl != "-1", 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1",
end.clus = c("c5", "c6", "c9"))
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4"
## [1] "c1" "c8" "c9"
## NULL
## [1] "K=4"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4" "c9"
## [1] "c1" "c3" "c5"
## [1] "c1" "c2"
## NULL
## [1] "K=5"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4" "c9"
## NULL
## NULL
K = 3 is better than when we did not refit zinbwave but still not perfect: Sus in all the clusters. GBC not really at the junction.
K = 4 good even if GBC not really at the junction.
K = 5 not great GBC is an end cluster.
mod <- model.matrix(~ batch[our_cl != "-1"])
fn <- '../data/refit_zinbwave_slingshot.rda'
if (runZinb & !file.exists(fn)){
zinbList <- lapply(Kvec, function(k){
zinbFit(core[, our_cl != "-1"], X = mod, K = k, ncores = NCORES)
})
save(zinbList, file = fn)
}else{
load(fn)
}
for(k in Kvec) {
X <- getW(zinbList[[k - 2]])[, 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1")
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c5" "c8" "c7" "c6"
## [1] "c1" "c5" "c8" "c4" "c9"
## [1] "c1" "c5" "c2"
## [1] "c1" "c5" "c3"
## NULL
## [1] "K=4"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4" "c9"
## NULL
## NULL
## [1] "K=5"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c4"
## [1] "c1" "c8" "c6"
## [1] "c1" "c8" "c7"
## [1] "c1" "c8" "c9"
K = 3 Yeah! close to perfection.
K = 4 good even if GBC not really at the junction.
K = 5 no, GBC is an end cluster.
for(k in Kvec){
X <- getW(zinbList[[k - 2]])[, 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1",
end.clus = c("c5", "c6", "c9"))
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c4" "c8" "c7" "c6"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c4" "c9"
## NULL
## NULL
## [1] "K=4"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c7" "c6"
## [1] "c1" "c8" "c4" "c9"
## NULL
## NULL
## [1] "K=5"
## [1] "c1" "c2" "c3" "c5"
## [1] "c1" "c8" "c4"
## [1] "c1" "c8" "c6"
## [1] "c1" "c8" "c7"
## [1] "c1" "c8" "c9"
CONCLUSION: K = 5 is never great as GBC is generally an end cluster. K = 4 is ok for all the methods and a bit better when zinbwave is refitted. K = 3 when refitting and supervized is good.
It seems to me that using slingshot on W without re-fitting zinbwave with k = 4 gives good results where supervized mode is slightly better than unsupervized. It is just a one shot example and we should obviously not make a general conclusion, but I think that for the purpose of the workflow it is fine to use slingshot without refitting zinbwave. We should write a note to the user that it is better to refit zinbwave to have more power.
Here is the kind of plots we want to present
de <- read.csv('../data/oe_markers.txt', stringsAsFactors = F, header = F)
de <- de$V1
plotHeatmap(ceObj,
visualizeData = assays(se)$normalizedValues[rownames(se) %in% de, ],
clusterSamplesData = "dendrogramValue",
whichClusters = "primary",
main = 'Normalized values, 1000 most variable genes',
breaks = 0.99)
FP: Kelly, is it you who did the DE analysis for Russell paper? If yes, what tool did you use? On what data? The full quantile normalized counts? Do you have code available?
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] rARPACK_0.11-0 digest_0.6.12
## [3] RColorBrewer_1.1-2 Rtsne_0.13
## [5] magrittr_1.5 gplots_3.0.1
## [7] ggplot2_2.2.1 slingshot_0.0.3-5
## [9] princurve_1.1-12 zinbwave_0.1.4
## [11] clusterExperiment_1.3.0-9008 scone_1.0.0
## [13] scRNAseq_1.2.0 SummarizedExperiment_1.6.3
## [15] DelayedArray_0.2.4 matrixStats_0.52.2
## [17] Biobase_2.36.2 GenomicRanges_1.28.3
## [19] GenomeInfoDb_1.12.1 IRanges_2.10.2
## [21] S4Vectors_0.14.2 BiocGenerics_0.22.0
##
## loaded via a namespace (and not attached):
## [1] shinydashboard_0.6.0 R.utils_2.5.0
## [3] RSQLite_1.1-2 AnnotationDbi_1.38.0
## [5] htmlwidgets_0.8 grid_3.4.0
## [7] trimcluster_0.1-2 BiocParallel_1.10.1
## [9] RNeXML_2.0.7 DESeq_1.28.0
## [11] munsell_0.4.3 codetools_0.2-15
## [13] statmod_1.4.29 scran_1.4.4
## [15] DT_0.2 miniUI_0.1.1
## [17] colorspace_1.3-2 energy_1.7-0
## [19] knitr_1.16 uuid_0.1-2
## [21] pspline_1.0-17 robustbase_0.92-7
## [23] bayesm_3.0-2 NMF_0.20.6
## [25] tximport_1.4.0 GenomeInfoDbData_0.99.0
## [27] hwriter_1.3.2 rhdf5_2.20.0
## [29] rprojroot_1.2 EDASeq_2.10.0
## [31] diptest_0.75-7 R6_2.2.1
## [33] doParallel_1.0.10 ggbeeswarm_0.5.3
## [35] taxize_0.8.4 locfit_1.5-9.1
## [37] flexmix_2.3-14 bitops_1.0-6
## [39] reshape_0.8.6 assertthat_0.2.0
## [41] scales_0.4.1 nnet_7.3-12
## [43] beeswarm_0.2.3 gtable_0.2.0
## [45] phylobase_0.8.4 RUVSeq_1.10.0
## [47] bold_0.4.0 rlang_0.1.1
## [49] genefilter_1.58.1 splines_3.4.0
## [51] rtracklayer_1.36.3 lazyeval_0.2.0
## [53] hexbin_1.27.1 rgl_0.98.1
## [55] yaml_2.1.14 reshape2_1.4.2
## [57] abind_1.4-5 GenomicFeatures_1.28.1
## [59] backports_1.1.0 httpuv_1.3.3
## [61] tensorA_0.36 tools_3.4.0
## [63] gridBase_0.4-7 stabledist_0.7-1
## [65] dynamicTreeCut_1.63-1 Rcpp_0.12.11
## [67] plyr_1.8.4 visNetwork_1.0.3
## [69] progress_1.1.2 zlibbioc_1.22.0
## [71] purrr_0.2.2.2 RCurl_1.95-4.8
## [73] prettyunits_1.0.2 viridis_0.4.0
## [75] zoo_1.8-0 cluster_2.0.6
## [77] data.table_1.10.4 RSpectra_0.12-0
## [79] mvtnorm_1.0-6 whisker_0.3-2
## [81] gsl_1.9-10.3 aroma.light_3.6.0
## [83] mime_0.5 evaluate_0.10
## [85] xtable_1.8-2 XML_3.98-1.7
## [87] mclust_5.3 gridExtra_2.2.1
## [89] compiler_3.4.0 biomaRt_2.32.0
## [91] scater_1.4.0 tibble_1.3.3
## [93] KernSmooth_2.23-15 R.oo_1.21.0
## [95] htmltools_0.3.6 pcaPP_1.9-61
## [97] segmented_0.5-2.0 tidyr_0.6.3
## [99] geneplotter_1.54.0 howmany_0.3-1
## [101] DBI_0.6-1 MASS_7.3-47
## [103] fpc_2.1-10 MAST_1.2.1
## [105] boot_1.3-19 compositions_1.40-1
## [107] ShortRead_1.34.0 Matrix_1.2-10
## [109] ade4_1.7-6 R.methodsS3_1.7.1
## [111] gdata_2.17.0 igraph_1.0.1
## [113] rncl_0.8.2 GenomicAlignments_1.12.1
## [115] registry_0.3 numDeriv_2016.8-1
## [117] locfdr_1.1-8 plotly_4.7.0
## [119] xml2_1.1.1 foreach_1.4.3
## [121] annotate_1.54.0 vipor_0.4.5
## [123] rngtools_1.2.4 pkgmaker_0.22
## [125] XVector_0.16.0 stringr_1.2.0
## [127] copula_0.999-16 ADGofTest_0.3
## [129] softImpute_1.4 Biostrings_2.44.0
## [131] rmarkdown_1.5 dendextend_1.5.2
## [133] edgeR_3.18.1 kernlab_0.9-25
## [135] shiny_1.0.3 Rsamtools_1.28.0
## [137] gtools_3.5.0 modeltools_0.2-21
## [139] rjson_0.2.15 nlme_3.1-131
## [141] jsonlite_1.4 viridisLite_0.2.0
## [143] limma_3.32.2 lattice_0.20-35
## [145] httr_1.2.1 DEoptimR_1.0-8
## [147] survival_2.41-3 FNN_1.1
## [149] prabclus_2.2-6 iterators_1.0.8
## [151] glmnet_2.0-10 class_7.3-14
## [153] stringi_1.1.5 mixtools_1.1.0
## [155] latticeExtra_0.6-28 caTools_1.17.1
## [157] memoise_1.1.0 dplyr_0.5.0
## [159] ape_4.1